

@tangxifan
Contributor

@tangxifan tangxifan commented Jul 19, 2021

Description

This PR focuses on updating routing resource graph builder functions, where we use the refactored data structure RRGraphBuilder to replace the legacy data structure rr_node_indices.
This PR aims to eliminate the get_rr_node_indices() functions in the RRGraph builders, as a further step toward deprecating the legacy data structure.

After this PR, the rr_node_indices data structure is only used in the verify_rr_node_indices() function:

bool verify_rr_node_indices(const DeviceGrid& grid, const t_rr_node_indices& rr_node_indices, const t_rr_graph_storage& rr_nodes) {
    std::unordered_map<int, int> rr_node_counts;
    auto& device_ctx = g_vpr_ctx.device();
    const auto& rr_graph = device_ctx.rr_graph;

verify_rr_node_indices() could become an API of the RRGraphBuilder data structure, since it is a validator.
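The snippet above begins by building an rr_node_counts map. A hedged sketch of that validation idea follows: count how many lookup slots reference each node and compare against the number of locations the node is expected to occupy. It is not the body of verify_rr_node_indices(), which is not shown here; ToyLookup, LocKey and expected_locations are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical lookup key: (x, y, side). Not the real t_rr_node_indices layout.
using LocKey = std::tuple<int, int, int>;

// Hypothetical toy lookup: each location holds a list of node ids (-1 = empty slot).
using ToyLookup = std::map<LocKey, std::vector<int>>;

// Returns true if every node appears in the lookup exactly as many times as the
// number of locations it is expected to occupy (expected_locations[node]).
bool verify_toy_lookup(const ToyLookup& lookup, const std::vector<int>& expected_locations) {
    std::unordered_map<int, int> node_counts;
    for (const auto& entry : lookup) {
        for (int node : entry.second) {
            if (node < 0) continue;  // skip unset slots
            if (static_cast<std::size_t>(node) >= expected_locations.size()) {
                return false;  // lookup references a node that does not exist
            }
            ++node_counts[node];
        }
    }
    for (std::size_t node = 0; node < expected_locations.size(); ++node) {
        auto it = node_counts.find(static_cast<int>(node));
        int count = (it == node_counts.end()) ? 0 : it->second;
        if (count != expected_locations[node]) return false;
    }
    return true;
}
```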

Checklist:

  • Added comments to the API add_nodes_at_all_locs() as requested in PR Deploy RRGraphBuilder in RRGraph Reader and Writer to replace the use of rr_node_indices #1800
  • Added a new API find_grid_nodes_at_all_sides() to RRSpatialLookup and removed the API find_sink_nodes()
  • Deprecated the get_rr_node_indices() functions
  • Removed t_opin_connections_scratchpad and used local variables instead
  • Improved memory efficiency in the RRSpatialLookup APIs (a lot of rework is still needed)
  • Removed the use of the scratchpad in the timing-driven placer lookup builder to save memory.
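The lookup behaviour described in this checklist (registering a node at every location and side it occupies, and gathering a grid location's nodes across all sides) can be pictured with a small toy class. ToySpatialLookup, its (x, y, ptc, side) key and its method names are hypothetical stand-ins inspired by add_nodes_at_all_locs() and find_grid_nodes_at_all_sides(). They do not reproduce the RRSpatialLookup signatures, and the de-duplication in find_nodes_at_all_sides() is a choice made for this sketch only.

```cpp
#include <algorithm>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical side enum for this sketch only.
enum ToySide { TOP = 0, RIGHT, BOTTOM, LEFT, NUM_TOY_SIDES };

// Hypothetical toy lookup keyed by (x, y, ptc, side); not the RRSpatialLookup layout.
class ToySpatialLookup {
  public:
    // Register a node at a single (x, y, ptc, side) slot.
    void add_node(int node, int x, int y, int ptc, ToySide side) {
        nodes_[std::make_tuple(x, y, ptc, static_cast<int>(side))] = node;
    }

    // Register a node at every (x, y) in [xlow, xhigh] x [ylow, yhigh] and on every listed
    // side, e.g. a wire spanning several tiles or a pin reachable from several sides.
    void add_node_at_all_locs(int node, int xlow, int xhigh, int ylow, int yhigh, int ptc,
                              const std::vector<ToySide>& sides) {
        for (int x = xlow; x <= xhigh; ++x) {
            for (int y = ylow; y <= yhigh; ++y) {
                for (ToySide side : sides) {
                    add_node(node, x, y, ptc, side);
                }
            }
        }
    }

    // Return the node at (x, y, ptc, side), or -1 if the slot is empty.
    int find_node(int x, int y, int ptc, ToySide side) const {
        auto it = nodes_.find(std::make_tuple(x, y, ptc, static_cast<int>(side)));
        return it == nodes_.end() ? -1 : it->second;
    }

    // Gather the distinct nodes stored at (x, y, ptc) over all sides.
    std::vector<int> find_nodes_at_all_sides(int x, int y, int ptc) const {
        std::vector<int> result;
        for (int s = 0; s < NUM_TOY_SIDES; ++s) {
            int node = find_node(x, y, ptc, static_cast<ToySide>(s));
            if (node >= 0 && std::find(result.begin(), result.end(), node) == result.end()) {
                result.push_back(node);
            }
        }
        return result;
    }

  private:
    std::map<std::tuple<int, int, int, int>, int> nodes_;
};
```

For example, a pin registered on the TOP and RIGHT sides of one tile is returned once by find_nodes_at_all_sides(), while a wire registered over x = 0..3 can be found from any of those x positions.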

Related Issue

Motivation and Context

This pull request is a follow-up to the routing resource graph refactoring effort in #1801.

How Has This Been Tested?

Following PR #1801, we are reworking, as a high priority, all the source files that use the legacy data structure rr_node_indices, in order to deprecate it as soon as possible.
Files that currently use rr_node_indices (143 related lines in total):

  • ./route/router_lookahead_map_utils.cpp
  • ./route/rr_graph.cpp
  • ./route/rr_graph2.cpp
  • ./route/rr_graph2.h

This PR removes its use from:

  • ./route/router_lookahead_map_utils.cpp
  • ./route/rr_graph.cpp

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Jul 19, 2021
@tangxifan
Contributor Author

@vaughnbetz @hzeller This PR is ready for your review.

@tangxifan tangxifan requested review from hzeller and vaughnbetz July 19, 2021 15:06
RRSpatialLookup& node_lookup();
/* Add an existing rr_node in the node storage to the node look-up
* The node will be added to the lookup for every side it is on (for OPINs and IPINs)
* and for every (x,y) location at which it exists (for wires that span more than one (x,y).
Contributor

Small typo (probably from my original comment): stray bracket in "(for".

Contributor Author

Sorry for the careless comment. Fixed now.

Contributor

@vaughnbetz vaughnbetz left a comment

It seems like some of these data structures (the "label" ones, for example) are really storing track numbers, not RRNodeIds. Hence I think some of the vector type changes are incorrect or reduce clarity. Please take a look at the detailed comments and see if you agree.

If they are indeed RRNodeId vectors, then we should still make a change to avoid RRNodeId(UN_SET) and instead use RRNodeId::INVALID().

}

if ((*incoming_wire_label[side_cw])[itrack] != UN_SET) {
if ((*incoming_wire_label[side_cw])[itrack] != RRNodeId(UN_SET)) {
Contributor

This cast looks strange to me. Is this data structure storing an rr_node index (RRNodeId) or an integer that represents something else? If it is storing an RRNodeId, it seems like we should get rid of UN_SET and instead use RRNodeId::INVALID() as the sentinel value for "not set yet" (and update the comments to match).

Right now both RRNodeId::INVALID() and UN_SET are -1, I think, so it will all work, but it seems strange to define a separate UN_SET invalid sentinel.

Contributor

If this data structure is indeed storing an RRNodeId, it would be good to add a comment to that effect in this routine and in the data structure definition (unless there is one already, but I didn't find it). From a quick look at the code I couldn't quite figure out whether this incoming_wire_label data structure stores some kind of track index or a unique rr_node id. If you know the answer, Xifan, it would be good to add a comment now.
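To illustrate the reviewer's point: when a container really holds node ids, its "not set" value can be the id type's own sentinel, so no separate UN_SET constant or RRNodeId(UN_SET) cast is needed; when it holds track numbers, it can simply stay int. ToyNodeId below is a hypothetical stand-in for the strong-id idea behind RRNodeId, not VTR's definition, and UNSET_TRACK is likewise hypothetical.

```cpp
#include <vector>

// Hypothetical strong id: wraps an int and reserves -1 as its invalid value.
class ToyNodeId {
  public:
    constexpr ToyNodeId() : id_(-1) {}
    constexpr explicit ToyNodeId(int id) : id_(id) {}
    static constexpr ToyNodeId INVALID() { return ToyNodeId(); }
    constexpr bool is_valid() const { return id_ != -1; }
    constexpr int value() const { return id_; }
    friend constexpr bool operator==(ToyNodeId a, ToyNodeId b) { return a.id_ == b.id_; }
    friend constexpr bool operator!=(ToyNodeId a, ToyNodeId b) { return !(a == b); }

  private:
    int id_;
};

// Node-id labels: initialized with the id type's own sentinel, no cast from an int constant.
std::vector<ToyNodeId> make_unset_node_labels(int max_chan_width) {
    return std::vector<ToyNodeId>(max_chan_width, ToyNodeId::INVALID());
}

// Track-number labels (0 .. W-1) are plain integers, so they stay ints with an int sentinel.
constexpr int UNSET_TRACK = -1;
std::vector<int> make_unset_track_labels(int max_chan_width) {
    return std::vector<int>(max_chan_width, UNSET_TRACK);
}
```

Keeping the two kinds of labels in differently typed vectors means the compiler rejects accidentally storing a track number where a node id is expected.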

* If seg_type_index == UNDEFINED, all segments in the channel are considered. Otherwise this routine
* only looks at segments that belong to the specified segment type. */

std::vector<int>& labels = *labels_ptr;
Contributor

Are these "labels" actually RRNodeIds, or are they track numbers, or something else? Depending on the answer, they should stay ints or be converted to RRNodeIds as this PR does. The comment should be updated to explain what a label is (an RRNodeId, a track number, or something else).


/* Alloc the list of labels for the tracks */
labels.resize(max_chan_width);
std::fill(labels.begin(), labels.end(), UN_SET);
Contributor

Same comment on UN_SET: it seems like it should be RRNodeId::INVALID() if this is actually storing an rr_node index. If it isn't, we should change the type of the vector back to int.

RRNodeId max_track = RRNodeId::INVALID();

for (int i = 0; i < num_wire_muxes; i++) {
if (wire_mux_on_track[i] == from_track) {
Contributor

It seems like label is a track number here (from 0 to W-1) so this implies "labels" are really "track numbers" and should be kept as ints, I think.


typedef vtr::NdMatrix<short, 6> t_sblock_pattern;

struct t_opin_connections_scratchpad {
Contributor

It would be good to comment on what this is for, if you know. Why a dimension of 8?
Does it store RRNodeIds, or is it some tile structure that stores track numbers or some such (which would be better left as an int)?

@tangxifan
Contributor Author

Hi @vaughnbetz
Thanks for the constructive comments. I had the same feeling about t_opin_connections_scratchpad while doing the refactoring.

My thoughts

After an in-depth read of the code, I think you are right.

  • The data type of t_opin_connections_scratchpad should be int rather than RRNodeId. It is mainly used in the codebase to store a track index (0 ... max_chan_width - 1) rather than a valid node id.
  • Oddly, some functions, namely build_rr_sinks_sources() and get_opin_direct_connections(), use it to store the node ids that are needed to create edges.

To summarize, its usage in the current code is mixed.

  • Why t_opin_connections_scratchpad has a dimension of 8: from the code, it appears to hold two groups of 4 entries, one entry per side of a switch block.

VTR_ASSERT(scratchpad->scratch.size() == NUM_SIDES * 2);

The first group denotes the output ports of a switch block

wire_mux_on_track[side] = &scratchpad->scratch[side];

The second group denotes the incoming wires, which are the input ports of a switch block

incoming_wire_label[side] = &scratchpad->scratch[NUM_SIDES + side];
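The indexing described above can be captured in a few lines. ToyScratchpad and the two accessor functions are hypothetical; only the NUM_SIDES * 2 size and the index expressions scratch[side] and scratch[NUM_SIDES + side] come from the snippets, and NUM_SIDES = 4 is assumed.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t NUM_SIDES = 4;  // assumption: TOP, RIGHT, BOTTOM, LEFT

// Hypothetical stand-in for t_opin_connections_scratchpad: 2 groups x 4 sides of int vectors.
struct ToyScratchpad {
    std::array<std::vector<int>, NUM_SIDES * 2> scratch;
};

// First group (entries 0..3): track indices of wire muxes, i.e. switch block outputs, per side.
std::vector<int>& wire_mux_on_track(ToyScratchpad& pad, std::size_t side) {
    return pad.scratch[side];
}

// Second group (entries 4..7): labels of incoming wires, i.e. switch block inputs, per side.
std::vector<int>& incoming_wire_label(ToyScratchpad& pad, std::size_t side) {
    return pad.scratch[NUM_SIDES + side];
}
```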

Detailed Analysis

The t_opin_connections_scratchpad is used in two code blocks:

  • The code block that creates unidirectional switch block patterns before allocating the rr_graph:

if (is_global_graph) {
    switch_block_conn = alloc_and_load_switch_block_conn(1, SUBSET, 3);
} else if (BI_DIRECTIONAL == directionality) {
    if (sb_type == CUSTOM) {
        sb_conn_map = alloc_and_load_switchblock_permutations(chan_details_x, chan_details_y,
                                                              grid,
                                                              switchblocks, &nodes_per_chan, directionality,
                                                              switchpoint_rand_state);
    } else {
        switch_block_conn = alloc_and_load_switch_block_conn(max_chan_width, sb_type, Fs);
    }
} else {
    VTR_ASSERT(UNI_DIRECTIONAL == directionality);
    if (sb_type == CUSTOM) {
        sb_conn_map = alloc_and_load_switchblock_permutations(chan_details_x, chan_details_y,
                                                              grid,
                                                              switchblocks, &nodes_per_chan, directionality,
                                                              switchpoint_rand_state);
    } else {
        /* it looks like we get unbalanced muxing from this switch block code with Fs > 3 */
        VTR_ASSERT(Fs == 3);
        t_opin_connections_scratchpad scratchpad;
        unidir_sb_pattern = alloc_sblock_pattern_lookup(grid, max_chan_width);
        for (size_t i = 0; i < grid.width() - 1; i++) {
            for (size_t j = 0; j < grid.height() - 1; j++) {
                load_sblock_pattern_lookup(i, j, grid, &nodes_per_chan,
                                           chan_details_x, chan_details_y,
                                           Fs, sb_type, unidir_sb_pattern,
                                           &scratchpad);
            }
        }
        if (getEchoEnabled() && isEchoFileEnabled(E_ECHO_SBLOCK_PATTERN)) {
            dump_sblock_pattern(unidir_sb_pattern, max_chan_width, grid,
                                getEchoFileName(E_ECHO_SBLOCK_PATTERN));
        }
    }
}

  • The code block that creates edges for unidirectional wires/OPINs when allocating the rr_graph:

t_opin_connections_scratchpad scratchpad;
/* If Fc gets clipped, this will be flagged to true */
*Fc_clipped = false;
/* Connection SINKS and SOURCES to their pins. */
for (size_t i = 0; i < grid.width(); ++i) {
    for (size_t j = 0; j < grid.height(); ++j) {
        build_rr_sinks_sources(rr_graph_builder, i, j, L_rr_node, rr_edges_to_create, L_rr_node_indices,
                               delayless_switch, grid, &scratchpad);
        //Create the actual SOURCE->OPIN, IPIN->SINK edges
        uniquify_edges(rr_edges_to_create);
        alloc_and_load_edges(L_rr_node, rr_edges_to_create);
        rr_edges_to_create.clear();
    }
}

The following functions use the scratchpad:

  • build_rr_graph()
    • load_sblock_pattern_lookup()
      • label_incoming_wires(): always resets the scratchpad and refills it
      • label_wire_muxes(): always resets the scratchpad and refills it
    • alloc_and_load_rr_graph()
      • build_rr_sinks_sources(): assigns some RRNodeIds that are never used later
      • build_bidir_rr_opins()
      • get_opin_direct_connections(): always resets the 1st dimension of the scratchpad and refills it with RRNodeIds, which are then used to create edges
      • build_unidir_rr_opins()
        • get_unidir_opin_connections()
          • label_wire_muxes(): always resets the scratchpad and refills it
        • get_opin_direct_connections(): always resets the 1st dimension of the scratchpad and refills it with RRNodeIds, which are then used to create edges
    • build_rr_chan()
      • get_track_to_tracks()
        • get_unidir_track_to_chan_seg()
          • label_wire_muxes(): always resets the 1st and 2nd dimensions of the scratchpad and refills them

Action items

I think the scratchpad creates a lot of confusion between functions.

  • Different functions use it for different data types.
  • Every function resets and refills it, and previous contents are never reused, so it is not used to exchange data between functions. It really is just a scratchpad.

My proposal for the refactoring:

  • Since the scratchpad does not carry data between functions, it should be a local variable in each of these functions.
  • The scratchpad should use int as its data type, because it is used in switch block pattern generation.
  • The scratchpad in build_rr_sinks_sources() and get_opin_direct_connections() should be removed and replaced with a local vector of RRNodeId.
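A sketch of the shape of the proposed refactoring: each helper owns a local vector of the appropriate type (int track indices for switch-block patterns, node ids for OPIN connections) instead of borrowing slots from a shared scratchpad. The function names, the ToyRRNodeId type and the every-stride-th-track rule are hypothetical placeholders, not VPR code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical typed node id for this sketch (stands in for RRNodeId).
struct ToyRRNodeId {
    int value;
};

// Switch-block helper (hypothetical): owns a local, int-typed vector of track indices
// (0 .. chan_width - 1) instead of writing into a shared scratchpad slot.
// The "every stride-th track gets a mux" rule is a placeholder. Requires stride > 0.
std::vector<int> toy_label_wire_muxes(int chan_width, int stride) {
    std::vector<int> tracks;
    for (int track = 0; track < chan_width; track += stride) {
        tracks.push_back(track);
    }
    return tracks;
}

// OPIN-side helper (hypothetical): node ids live in their own, correctly typed local
// vector; track indices are converted to node ids only at this boundary.
std::vector<ToyRRNodeId> toy_opin_targets(const std::vector<int>& tracks,
                                          const std::vector<ToyRRNodeId>& track_to_node) {
    std::vector<ToyRRNodeId> targets;
    targets.reserve(tracks.size());
    for (int track : tracks) {
        targets.push_back(track_to_node[static_cast<std::size_t>(track)]);
    }
    return targets;
}
```

The trade-off is that each call now allocates its own vectors, which is cheap per call but can add up if the helpers run inside tight per-tile loops.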

Let me know what you think. Once we converge on the action items, I will refactor accordingly.

@vaughnbetz
Contributor

Thanks for the detailed analysis Xifan. I agree with your proposal -- making these local variables of the right type seems like the best approach.

…ded comments to clarify the use of scratchpad
@tangxifan
Contributor Author

@vaughnbetz Thanks for the input. I have removed the use of t_opin_connections_scratchpad as a function input argument, and added comments to t_opin_connections_scratchpad based on the analysis.

The PR is ready for your review.

@tangxifan
Contributor Author

I do not yet know why the basic sanity tests failed; I am looking into the problems. I will ping you when CI is green. Then it will be truly ready for code review.

@vaughnbetz
Contributor

Maybe without the scratchpads we're doing huge amounts of memory traffic and memory fragmentation?

@tangxifan
Contributor Author

Maybe without the scratchpads we're doing huge amounts of memory traffic and memory fragmentation?

Yes. I checked the log files from the basic regression tests in sanity mode. Memory usage explodes: in some tests the peak memory usage reaches 6 GB, which caused the CI runner to abort.

However, the scratchpad is only used in the rr_graph builder, and we did not see a sharp increase in memory there.
The placer really consumes a lot of memory, far more than the rr_graph and the router lookahead map (delta_rss of +3344.4 MiB for placement versus +463.2 MiB and +514.6 MiB for those two steps in the log below).
I tried to reproduce the error today on an Ubuntu machine, but oddly I cannot reproduce the error or the memory usage on my machine.

I am trying to find a way to reduce the peak memory usage when debug mode is on.
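The memory-traffic hypothesis raised above can be illustrated in isolation. A vector declared inside a hot loop is typically reallocated on every iteration, while a vector hoisted out of the loop and cleared keeps its capacity (std::vector::clear() leaves capacity() unchanged), so only the first pass allocates. The sketch below counts capacity growth events as a rough proxy for heap allocations; the function names are hypothetical and nothing here measures VPR itself.

```cpp
#include <cstddef>
#include <vector>

// Fresh vector per iteration: every iteration regrows from an empty vector.
std::size_t growths_with_local_vector(int iterations, int width) {
    std::size_t growths = 0;
    for (int i = 0; i < iterations; ++i) {
        std::vector<int> labels;  // new, empty vector each iteration
        std::size_t cap = labels.capacity();
        for (int t = 0; t < width; ++t) {
            labels.push_back(t);
            if (labels.capacity() != cap) {  // capacity changed: a reallocation happened
                ++growths;
                cap = labels.capacity();
            }
        }
    }
    return growths;
}

// Vector hoisted out of the loop: clear() keeps capacity, so only the first pass grows it.
std::size_t growths_with_reused_vector(int iterations, int width) {
    std::size_t growths = 0;
    std::vector<int> labels;
    std::size_t cap = labels.capacity();
    for (int i = 0; i < iterations; ++i) {
        labels.clear();
        for (int t = 0; t < width; ++t) {
            labels.push_back(t);
            if (labels.capacity() != cap) {
                ++growths;
                cap = labels.capacity();
            }
        }
    }
    return growths;
}
```

Calling reserve() once up front is another common way to avoid repeated growth when the final size (for example max_chan_width) is known in advance.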

Below are some lines from the log file at https://github.com/verilog-to-routing/vtr-verilog-to-routing/pull/1805/checks?check_run_id=3118949531

<## Placement Quench took 0.40 seconds (max_rss 3763.9 MiB)
	<
	<BB estimate of min-dist (placement) wire length: 602
	<
	<Completed placement consistency check successfully.
	<
	<Swaps called: 154126
	<
	<Aborted Move Reasons:
	<  No moves aborted
	<Placement cost: 6.02316, bb_cost: 6.02316, td_cost: nan, 
	<
	<Placement resource usage:
	<  io     implemented as io    : 229
	<  clb    implemented as clb   : 72
	<  memory implemented as memory: 1
	<
	<Placement number of temperatures: 152
	<Placement total # of swap attempts: 154126
	<	Swaps accepted:  78707 (51.1 %)
	<	Swaps rejected:  75419 (48.9 %)
	<	Swaps aborted :      0 ( 0.0 %)
	<
	<
	<Percentage of different move types:
	<	Uniform move: 100.00 % (acc=51.07 %, rej=48.93 %, aborted=0.00 %)
	<	W. Centroid move: 0.00 % (acc=100.00 %, rej=0.00 %, aborted=0.00 %)
	<
	<Placement Quench timing analysis took 0 seconds (0 STA, 0 slack) (0 full updates: 0 setup, 0 hold, 0 combined).
	<Placement Total  timing analysis took 0 seconds (0 STA, 0 slack) (0 full updates: 0 setup, 0 hold, 0 combined).
	<update_td_costs: connections 0 nets 0 sum_nets 0 total 0
	<# Placement took 48.34 seconds (max_rss 3764.1 MiB, delta_rss +3344.4 MiB)
	<
	<# Routing
	<Initializing minimum channel width search using specified hint
	<
	<Attempting to route at 36 channels (binary search bounds: [-1, -1])
	<## Build routing resource graph
	<## Build routing resource graph took 4.63 seconds (max_rss 4229.8 MiB, delta_rss +463.2 MiB)
	<  RR Graph Nodes: 11337
	<  RR Graph Edges: 46620
	<Confirming router algorithm: TIMING_DRIVEN.
	<## Computing router lookahead map
	<### Computing wire lookahead
	<### Computing wire lookahead took 5.35 seconds (max_rss 4785.5 MiB, delta_rss +512.2 MiB)
	<### Computing src/opin lookahead
	<### Computing src/opin lookahead took 0.03 seconds (max_rss 4787.9 MiB, delta_rss +2.3 MiB)
	<## Computing router lookahead map took 5.38 seconds (max_rss 4787.9 MiB, delta_rss +514.6 MiB)

@vaughnbetz
Contributor

The placer builds the rr-graph to compute the place delay matrix (uses a router-like algorithm) so the rr-graph would be built there.
I suspect that's the largest memory use of the placer.
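For context on the comment above: a place delay matrix can be filled by running a shortest-path search over the routing graph from a source and recording the best delay to each destination offset. The following is a minimal Dijkstra sketch over a toy graph. ToyEdge, the sink_dx offset mapping and the 1-D delay table are simplifying assumptions, and VPR's actual delay-matrix computation is not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy routing-graph edge: destination node and its delay.
struct ToyEdge {
    int to;
    double delay;
};

// Dijkstra from `source`: minimum accumulated delay to every node
// (infinity if unreachable). Assumes non-negative edge delays.
std::vector<double> min_delays_from(const std::vector<std::vector<ToyEdge>>& graph, int source) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), inf);
    using Item = std::pair<double, int>;  // (delay so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    dist[source] = 0.0;
    heap.emplace(0.0, source);
    while (!heap.empty()) {
        double d = heap.top().first;
        int node = heap.top().second;
        heap.pop();
        if (d > dist[node]) continue;  // stale entry: a shorter path was already found
        for (const ToyEdge& e : graph[node]) {
            double nd = d + e.delay;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                heap.emplace(nd, e.to);
            }
        }
    }
    return dist;
}

// Collapse per-node delays into a 1-D "delta delay" table indexed by x-offset.
// sink_dx[n] is the (hypothetical) x-offset of node n from the source, or -1 if n is not a sink.
std::vector<double> toy_delta_delays(const std::vector<std::vector<ToyEdge>>& graph, int source,
                                     const std::vector<int>& sink_dx, int max_dx) {
    std::vector<double> dist = min_delays_from(graph, source);
    std::vector<double> delta(static_cast<std::size_t>(max_dx) + 1,
                              std::numeric_limits<double>::infinity());
    for (std::size_t n = 0; n < graph.size(); ++n) {
        int dx = sink_dx[n];
        if (dx >= 0 && dx <= max_dx && dist[n] < delta[dx]) {
            delta[dx] = dist[n];
        }
    }
    return delta;
}
```

Because the search needs the whole routing graph in memory, building that graph inside placement is consistent with the placer showing the largest memory jump in the log.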

@tangxifan
Contributor Author

The placer builds the rr-graph to compute the place delay matrix (uses a router-like algorithm) so the rr-graph would be built there.
I suspect that's the largest memory use of the placer.

Yes. I am checking my previous PRs. I remember modifying a function in the placer, which may be the source of this mess.

@tangxifan
Contributor Author

A to-do list as a reminder before the PR can be merged: upload a QoR comparison between the current branch and VTR before the refactoring (I will create a branch based on the current master and revert a number of commits) on the following tests:

@tangxifan
Contributor Author

I just tried the Titan benchmarks on this PR. The gaussianblur benchmark cannot be routed. I am going to run the Titan benchmarks on an old master (before the refactoring happened), with the aim of identifying which PR caused the problem.

Here is the log file:

==========================================================================
                  Verilog-to-Routing Regression Testing
==========================================================================
           Running vtr_reg_weekly
--------------------------------------------
scripts/run_vtr_task.py -l /research/ece/lnis/USERS/tang/github/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_weekly/task_list.txt -j 23 -script run_vtr_flow.py -short_task_names 

stratixiv_arch.timing/neuron_stratixiv_arch_timing		OK (took 1053.81 seconds)
stratixiv_arch.timing/stereo_vision_stratixiv_arch_timing		OK (took 1099.90 seconds)
stratixiv_arch.timing/sparcT1_core_stratixiv_arch_timing		OK (took 1616.09 seconds)
stratixiv_arch.timing/cholesky_mc_stratixiv_arch_timing		OK (took 1633.08 seconds)
stratixiv_arch.timing/SLAM_spheric_stratixiv_arch_timing		OK (took 2493.78 seconds)
stratixiv_arch.timing/des90_stratixiv_arch_timing		OK (took 3111.53 seconds)
stratixiv_arch.timing/dart_stratixiv_arch_timing		OK (took 3653.31 seconds)
stratixiv_arch.timing/segmentation_stratixiv_arch_timing		OK (took 4292.70 seconds)
stratixiv_arch.timing/openCV_stratixiv_arch_timing		OK (took 5096.52 seconds)
stratixiv_arch.timing/cholesky_bdti_stratixiv_arch_timing		OK (took 5212.57 seconds)
stratixiv_arch.timing/minres_stratixiv_arch_timing		OK (took 5303.49 seconds)
stratixiv_arch.timing/stap_qrd_stratixiv_arch_timing		OK (took 6374.38 seconds)
stratixiv_arch.timing/bitonic_mesh_stratixiv_arch_timing		OK (took 6379.55 seconds)
stratixiv_arch.timing/sparcT2_core_stratixiv_arch_timing		OK (took 8691.24 seconds)
stratixiv_arch.timing/denoise_stratixiv_arch_timing		OK (took 10788.75 seconds)
stratixiv_arch.timing/gsm_switch_stratixiv_arch_timing		OK (took 12355.49 seconds)
stratixiv_arch.timing/mes_noc_stratixiv_arch_timing		OK (took 16831.46 seconds)
stratixiv_arch.timing/LU_Network_stratixiv_arch_timing		OK (took 18483.60 seconds)
stratixiv_arch.timing/sparcT1_chip2_stratixiv_arch_timing		OK (took 22005.97 seconds)
stratixiv_arch.timing/bitcoin_miner_stratixiv_arch_timing		OK (took 38720.83 seconds)
stratixiv_arch.timing/directrf_stratixiv_arch_timing		OK (took 44879.60 seconds)
stratixiv_arch.timing/LU230_stratixiv_arch_timing		OK (took 85939.03 seconds)
stratixiv_arch.timing/gaussianblur_stratixiv_arch_timing		Error: Executable vpr failed
	full command:  /usr/bin/env time -v /research/ece/lnis/USERS/tang/github/vtr-verilog-to-routing/vpr/vpr stratixiv_arch.timing.xml gaussianblur_stratixiv_arch_timing --circuit_file gaussianblur_stratixiv_arch_timing.pre-vpr.blif --route_chan_width 300 --max_router_iterations 400 --router_lookahead map --inner_num 2 --astar_fac 1.0 --sdc_file /research/ece/lnis/USERS/tang/github/vtr-verilog-to-routing/vtr_flow/benchmarks/titan_blif/gaussianblur_stratixiv_arch_timing.sdc
	returncode  :  -15
	log file    :  /research/ece/lnis/USERS/tang/github/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_weekly/vtr_reg_titan_he/run001/stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common/vpr.out
failed: Executable vpr failed (took 448194.47 seconds)
Elapsed time: 448194.79 seconds

Parsing test results...
scripts/parse_vtr_task.py -l /research/ece/lnis/USERS/tang/github/vtr-verilog-to-routing/vtr_flow/tasks/regression_tests/vtr_reg_weekly/task_list.txt
Elapsed time: 6.55 seconds

Calculating QoR results...

regression_tests/vtr_reg_weekly/vtr_reg_titan_he...[Fail]
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common vpr_status Task value 'exited with return code -15' does not match golden 'success'
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common logic_block_area_total relative value inf outside of range [0.8,1.3] and not equal to golden value: 0.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common logic_block_area_used relative value inf outside of range [0.8,1.3] and not equal to golden value: 0.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common routing_area_total relative value -4.2460501118834204e-10 outside of range [0.8,1.3] and not equal to golden value: 2355130000.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common routing_area_per_tile relative value -4.9558925562493803e-05 outside of range [0.8,1.3] and not equal to golden value: 20178.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common crit_path_route_time relative value 156.82601085663865 outside of range [0.1,10.0], above absolute threshold 2.0 and not equal to golden value: 2085.36
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common max_vpr_mem relative value -3.3827119066046776e-08 outside of range [0.8,1.2], above absolute threshold 102400.0 and not equal to golden value: 29562080.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common critical_path_delay relative value -0.0010828030305491218 outside of range [0.5,1.4] and not equal to golden value: 923.529
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common geomean_nonvirtual_intradomain_critical_path_delay relative value -0.0010828030305491218 outside of range [0.5,1.4] and not equal to golden value: 923.529
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common setup_TNS relative value 7.812622072219879e-09 outside of range [0.5,1.4] and not equal to golden value: -127998000.0
[Fail]
stratixiv_arch.timing.xml/gaussianblur_stratixiv_arch_timing.blif/common setup_WNS relative value 0.0010839767638740896 outside of range [0.5,1.4] and not equal to golden value: -922.529

regression_tests/vtr_reg_weekly/vtr_reg_titan_he...[Fail]
[Fail]
stratixiv_arch.timing.xml/LU_Network_stratixiv_arch_timing.blif/common placed_CPD_est relative value 1.4130971947890367 outside of range [0.5,1.4] and not equal to golden value: 6.33357
[Fail]
stratixiv_arch.timing.xml/LU_Network_stratixiv_arch_timing.blif/common placed_setup_WNS_est relative value 1.4905494818667422 outside of range [0.5,1.4] and not equal to golden value: -5.33357
[Fail]
stratixiv_arch.timing.xml/LU_Network_stratixiv_arch_timing.blif/common critical_path_delay relative value 1.4059740225349875 outside of range [0.5,1.4] and not equal to golden value: 6.79743
[Fail]
stratixiv_arch.timing.xml/LU_Network_stratixiv_arch_timing.blif/common setup_WNS relative value 1.476000572667544 outside of range [0.5,1.4] and not equal to golden value: -5.79743

Test 'vtr_reg_weekly' had 15 qor test failures

Test 'vtr_reg_weekly' had 1 run failures

Error: 16 tests failed

@vaughnbetz
Contributor

@tangxifan: there seem to be two issues:

  • LU_Network slowed down by 41% (outside the 40% critical path tolerance).
  • gaussianblur failed to route. I think gaussianblur can sometimes fail to route, so this may be seed noise.
    Probably worth running a different seed to see if it passes, and whether LU_Network is within the QoR bounds as well.
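The failure messages in the log above follow a recognisable pattern: a metric fails when its ratio to the golden value leaves the allowed range, unless the absolute change is under a threshold (where one is given). The function below restates that rule as read from the messages; it is an assumption, not the actual logic of parse_vtr_task.py. With LU_Network's critical_path_delay ratio of about 1.406 against the [0.5, 1.4] range, it reports a failure, matching the 41% slowdown noted above.

```cpp
#include <cmath>

// Hypothetical restatement of the QoR range check implied by the log messages.
// Returns true if `value` passes against `golden`.
bool qor_within_tolerance(double value, double golden, double lo, double hi,
                          double abs_threshold = 0.0) {
    if (value == golden) return true;                             // identical to golden: pass
    if (std::fabs(value - golden) <= abs_threshold) return true;  // small absolute change: ignored
    if (golden == 0.0) return false;                              // ratio undefined (log shows "inf")
    double ratio = value / golden;
    return ratio >= lo && ratio <= hi;
}
```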

@tangxifan
Copy link
Contributor Author

@vaughnbetz Got it. Let me rerun it and check whether seed noise is indeed the problem. I will keep you posted.

Meanwhile, I have finished the QoR test on the VTR benchmark.
A short summary

  • No critical path delay degradation (as expected)
  • No change in peak memory usage
  • Reduced total flow runtime, but an average increase of 1-4% in placer and router runtime (in most benchmarks the place & route runtime is improved; the only glitch is sha, with a 40% increase in place & route runtime).

vtr_reg_qor_chain_depop_comp.xlsx

@tangxifan
Contributor Author

Just finished the QoR check on the titan_quick_qor test case

A short summary

  • No critical path delay changes
  • On average 1% reduction in peak memory usage after refactoring
  • On average 4% runtime increase (3% on pack time and 1% on place time)

Details can be found in the attached spreadsheet

vtr_reg_titan_quick_qor_comp.xlsx
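The summaries above quote average changes across benchmarks. The spreadsheets do not say how the averages are taken; one common choice for QoR ratios is the geometric mean, which treats a 2x increase and a 2x decrease symmetrically. The sketch below computes it; the function name is hypothetical and this is not necessarily how the attached figures were produced.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Geometric mean of new/old ratios; e.g. 1.04 corresponds to an average 4% increase.
// All values are assumed to be positive.
double geomean_ratio(const std::vector<double>& old_values, const std::vector<double>& new_values) {
    if (old_values.size() != new_values.size() || old_values.empty()) {
        throw std::invalid_argument("need equally sized, non-empty inputs");
    }
    double log_sum = 0.0;
    for (std::size_t i = 0; i < old_values.size(); ++i) {
        log_sum += std::log(new_values[i] / old_values[i]);
    }
    return std::exp(log_sum / static_cast<double>(old_values.size()));
}
```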

@tangxifan
Contributor Author

tangxifan commented Aug 14, 2021

Finished the basic sanity tests.
A short summary

  • Peak memory usage increases by 3% on average.
  • Runtime increases by 1% on average.
  • The peak memory usage of the two biggest designs, ch_intrinsics and diffeq1, is 486 MB and 627 MB respectively, which is within range.

Details can be found in the attached spreadsheet

vtr_reg_basic_timing_sanity_comp.xlsx

@tangxifan
Contributor Author

@vaughnbetz As suggested, I have completed the QoR checks on the Titan, VTR and basic sanity benchmarks. No significant change in peak memory usage and only small changes in runtime are observed. Please check whether it is good to go.

@vaughnbetz
Contributor

Looks good; thanks @tangxifan. LU_Network didn't finish on the titan_quick_qor test, but since it is tested in CI and CI is green, that must be a transient issue.

@vaughnbetz vaughnbetz merged commit 1a00ea9 into master Aug 15, 2021
@vaughnbetz vaughnbetz deleted the get_rr_node_indices branch August 15, 2021 18:22

3 participants